<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html><head><title>Programming in Lua : 11.6</title>
<link rel="stylesheet" type="text/css" href="pil.css">
</head>
<body>
<p class="warning">
<A HREF="contents.html"><IMG TITLE="Programming in Lua (first edition)" SRC="capa.jpg" ALT="" ALIGN="left"></A>This first edition was written for Lua 5.0. While still largely relevant for later versions, there are some differences.<BR>The third edition targets Lua 5.2 and is available at <A HREF="http://www.amazon.com/exec/obidos/ASIN/859037985X/theprogrammil3-20">Amazon</A> and other bookstores.<BR>By buying the book, you also help to <A HREF="../donations.html">support the Lua project</A>.
</p>
<table width="100%" class="nav">
<tr>
<td width="10%" align="left"><a href="11.5.html"><img border="0" src="left.png" alt="Previous"></a></td>
<td width="80%" align="center"><a href="contents.html"><font face="Helvetica,Arial,sanserif">
<font color="gray">Programming in </font><font color="blue"> Lua</font>
</font></a></td>
<td width="10%" align="right"><a href="12.html"><img border="0" src="right.png" alt="Next"></a></td>
</tr>
<tr>
<td width="10%" align="left"></td>
<td width="80%" align="center"><a href="contents.html#P2">Part II. Tables and Objects</a>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<a href="contents.html#11">Chapter 11. Data Structures</a></td>
<td width="10%" align="right"></td></tr>
</table>
<hr/>
<p><a name="StringBuffer"><h2>11.6 &ndash; String Buffers</h2></a>

<p>


<p>Suppose you are building a string piecemeal,
for instance reading a file line by line.
Your typical code would look like this:
<pre>
    -- WARNING: bad code ahead!!
    local buff = ""
    for line in io.lines() do
    buff = buff .. line .. "\n"
    end
</pre>
Despite its innocent look,
that code in Lua can cause a huge performance penalty for large files:
For instance,
it takes almost a minute to read a 350 KB file.
(That is why Lua provides the <code>io.read("*all")</code> option,
which reads the whole file in 0.02 seconds.)

<p>Why is that?
Lua uses a true garbage-collection algorithm;
when it detects that the program is using too much memory,
it goes through all its data structures and frees
those structures that are not in use anymore (the garbage).
Usually this algorithm has a good performance
(it is not by chance that Lua is so fast),
but the above loop takes the worst of the algorithm.

<p>To understand what happens,
let us assume that we are in the middle of the read loop;
<code>buff</code> is already a string with 50 KB
and each line has 20 bytes.
When Lua concatenates <code>buff..line.."\n"</code>,
it creates a new string with 50,020 bytes
and copies 50 KB from <code>buff</code> into this new string.
That is, for each new line,
Lua moves 50 KB of memory, and growing.
After reading 100 new lines (only 2 KB),
Lua has already moved more than 5 MB of memory.
To make things worse,
after the assignment
<pre>
        buff = buff .. line .. "\n"
</pre>
the old string is now garbage.
After two loop cycles,
there are two old strings making a
total of more than 100 KB of garbage.
So, Lua decides, quite correctly,
that it is a good time to run its garbage collector
and so it frees those 100 KB.
The problem is that this will happen every two cycles
and so Lua will run its garbage collector
two thousand times before reading the whole file.
Even with all this work,
its memory usage will be approximately three times the file size.

<p>This problem is not peculiar to Lua:
Other languages with true garbage collection,
and where strings are immutable objects,
present a similar behavior,
Java being the most famous example.
(Java offers the structure <code>StringBuffer</code> to
ameliorate the problem.)

<p>Before we continue, we should remark that,
despite all I said, that situation is not a common problem.
For small strings, the above loop is OK.
To read a whole file, we use the <code>"*all"</code> option,
which reads it at once.
However, sometimes there are no simple solutions.
Then, the only solution is a more efficient algorithm.
Here we show one.

<p>Our original loop took a linear approach to the problem,
concatenating small strings one by one into the accumulator.
This new algorithm avoids this,
using a binary approach instead.
It concatenates several small strings among them
and, occasionally, it concatenates the resulting large strings
into larger ones.
The heart of the algorithm is a stack
that keeps the large strings already created in its bottom,
while small strings enter through the top.
The main invariant of this stack is similar to that of
the popular (among programmers, at least) <em>Tower of Hanoi</em>:
A string in the stack can never sit over a shorter string.
Whenever a new string is pushed over a shorter one,
then (and only then) the algorithm concatenates both.
This concatenation creates a larger string,
which now may be larger than its neighbor in the previous floor.
If that happens, they are joined too.
Those concatenations go down the stack until the loop reaches
a larger string or the stack bottom.
<pre>
    function newStack ()
      return {""}   -- starts with an empty string
    end
    
    function addString (stack, s)
      table.insert(stack, s)    -- push 's' into the the stack
      for i=table.getn(stack)-1, 1, -1 do
        if string.len(stack[i]) > string.len(stack[i+1]) then
          break
        end
        stack[i] = stack[i] .. table.remove(stack)
      end
    end
</pre>
To get the final contents of the buffer,
we just need to concatenate all strings down to the bottom.
The <code>table.concat</code> function does exactly that:
It concatenates all strings of a list.


<p>Using this new data structure,
we can rewrite our program as follows:
<pre>
    local s = newStack()
    for line in io.lines() do
      addString(s, line .. "\n")
    end
    s = toString(s)
</pre>
This new program reduces our original time
to read a 350 KB file from 40 seconds to 0.5 seconds.
The call <code>io.read("*all")</code> is still faster,
finishing the job in 0.02 seconds.

<p>Actually, when we call <code>io.read("*all")</code>,
<code>io.read</code> uses exactly the data structure that we presented here,
but implemented in C.
Several other functions in the Lua
libraries also use this C implementation.
One of these functions is <code>table.concat</code>.
With <code>concat</code>, we can simply collect all strings in a table
and then concatenate all of them at once.
Because <code>concat</code> uses the C implementation,
it is efficient even for huge strings.

<p>The <code>concat</code> function accepts an optional second argument,
which is a separator to be inserted between the strings.
Using this separator, we do not need to insert a newline after
each line:
<pre>
    local t = {}
    for line in io.lines() do
      table.insert(t, line)
    end
    s = table.concat(t, "\n") .. "\n"
</pre>
(The <code>io.lines</code> iterator returns each line without the newline.)
<code>concat</code> inserts the separator between the strings,
but not after the last one,
so we have to add the last newline.
This last concatenation duplicates the resulting string,
which can be quite big.
There is no option to make <code>concat</code> insert this extra separator,
but we can deceive it,
inserting an extra empty string in <code>t</code>:
<pre>
    table.insert(t, "")
    s = table.concat(t, "\n")
</pre>
The extra newline that <code>concat</code> adds before this empty string
is at the end of the resulting string, as we wanted.

<p>
<hr/>
<table width="100%" class="nav">
<tr valign="top">
<td width="90%" align="left">
<small>
  Copyright &copy; 2003&ndash;2004 Roberto Ierusalimschy.  All rights reserved.
</small>
</td>
<td width="10%" align="right"><a href="12.html"><img border="0" src="right.png" alt="Next"></a></td>
</tr>
</table>


</body></html>

